Tuning multiple imputation by predictive mean matching and local residual draws
Abstract
BACKGROUND Multiple imputation is a commonly used method for handling incomplete covariates because it can provide valid inference when data are missing at random. This depends on correctly specifying the parametric model used to impute missing values, which may be difficult in many realistic settings. Imputation by predictive mean matching (PMM) borrows an observed value from a donor with a similar predictive mean; imputation by local residual draws (LRD) instead borrows the donor's residual. Both methods relax some assumptions of parametric imputation, promising greater robustness when the imputation model is misspecified.

METHODS We review the development of PMM and LRD, outline the various forms available, and aim to clarify some choices about how and when they should be used. We compare their performance with fully parametric imputation in simulation studies, first when the imputation model is correctly specified and then when it is misspecified.

RESULTS When using PMM or LRD, we strongly caution against using a single donor, the default in some implementations, and instead advocate sampling from a pool of around 10 donors. We also clarify which matching metric is best. Several current MI software packages have poor implementations.

CONCLUSIONS PMM and LRD may have a role in imputing covariates (i) that are not strongly associated with the outcome, and (ii) when the imputation model is thought to be slightly but not grossly misspecified. Researchers should focus their efforts on specifying the imputation model correctly, rather than expecting predictive mean matching or local residual draws to do the work.
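The two donor-based schemes described in the abstract can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation or any MI package's API: a single incomplete continuous variable `y`, fully observed covariates `X` (including an intercept column), a linear imputation model, and a hypothetical helper named `impute_once`. Each call draws σ* and β* from their approximate posterior, finds the `k` donors whose predictive means are closest to each missing case, and samples one of them at random; `method="pmm"` copies the donor's observed value, while `method="lrd"` adds the donor's residual to the recipient's predictive mean. Matching donors on β̂ and recipients on β* is one common choice assumed here; the abstract does not say which matching metric the paper finds best.

```python
import numpy as np


def impute_once(X, y, k=10, method="pmm", rng=None):
    """Fill the NaNs in y with one PMM or LRD draw (hypothetical helper).

    Assumes y is a 1-D float array with np.nan for missing values and X is a
    fully observed 2-D design matrix that includes an intercept column.
    """
    if method not in ("pmm", "lrd"):
        raise ValueError("method must be 'pmm' or 'lrd'")
    rng = np.random.default_rng() if rng is None else rng
    miss = np.isnan(y)
    X_obs, y_obs, X_mis = X[~miss], y[~miss], X[miss]
    n_obs, p = X_obs.shape

    # Linear imputation model fitted by least squares on the observed cases.
    beta_hat, *_ = np.linalg.lstsq(X_obs, y_obs, rcond=None)
    pred_donor = X_obs @ beta_hat            # donors' predictive means
    resid = y_obs - pred_donor               # donors' residuals (used by LRD)

    # Draw sigma* and beta* from their approximate posterior so that the
    # imputations reflect uncertainty in the regression parameters.
    sigma_star = np.sqrt(resid @ resid / rng.chisquare(n_obs - p))
    cov = sigma_star**2 * np.linalg.inv(X_obs.T @ X_obs)
    beta_star = rng.multivariate_normal(beta_hat, cov)
    pred_recip = X_mis @ beta_star           # recipients' predictive means

    fills = np.empty(len(pred_recip))
    for i, target in enumerate(pred_recip):
        # Pool of the k donors whose predictive means are closest to target.
        pool = np.argsort(np.abs(pred_donor - target))[:k]
        d = rng.choice(pool)                 # sample one donor at random
        if method == "pmm":
            fills[i] = y_obs[d]              # borrow the donor's observed value
        else:
            fills[i] = target + resid[d]     # borrow the donor's residual

    y_imp = y.copy()
    y_imp[miss] = fills
    return y_imp


# Demo on simulated data with values missing completely at random.
rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n)
y[rng.random(n) < 0.3] = np.nan

m = 5  # number of imputed datasets
pmm_sets = [impute_once(X, y, k=10, method="pmm", rng=rng) for _ in range(m)]
lrd_sets = [impute_once(X, y, k=10, method="lrd", rng=rng) for _ in range(m)]
print([round(float(d.mean()), 3) for d in pmm_sets])
```

Calling the helper m times yields m imputed datasets to analyse and pool. Setting `k=1` gives the single-donor setting the abstract cautions against: when β* varies little between imputations, the same donor tends to be picked every time, which plausibly understates between-imputation variability. The default `k=10` simply mirrors the abstract's suggested pool of around 10 donors.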
Similar sources
Author's response to reviews. Title: Tuning multiple imputation by predictive mean matching and local residual
Several approaches to handling missing values of quantitative variables and their effect on the results of a clinical trial
Background and Objectives: A major challenge in longitudinal studies is missing data. Missing data may cause the loss of part of the information, which reduces the accuracy of estimators and can make the results biased and inaccurate. Therefore, it is necessary to evaluate the missing-data mechanism in longitudinal research and to consider thi...
Flexible and Robust Method for Missing Loop Detector Data Imputation
This work is primarily focused on missing traffic sensor data imputation for the purpose of improving the coverage and accuracy of traffic analysis and performance estimation. Missing data, whether attributable to hardware failure or error detection and removal, is a constant problem in loop and other traffic detector datasets. As the rate of missingness increases, the treatment of miss...
Multiple Imputation to Correct for Nonresponse Bias: Application in Non-communicable Disease Risk Factors Survey.
BACKGROUND This study used multiple imputation (MI) to correct for potential nonresponse bias in measurements of the variable fasting blood glucose (FBS) in the non-communicable disease risk factors survey conducted in Iran in 2007. METHODS Five multiple imputation methods, namely bootstrap expectation maximization, multivariate normal regression, univariate linear r...
Comparison of imputation methods for handling missing covariate data when fitting a Cox proportional hazards model: a resampling study
BACKGROUND The appropriate handling of missing covariate data in prognostic modelling studies is yet to be conclusively determined. A resampling study was performed to investigate the effects of different missing data methods on the performance of a prognostic model. METHODS Observed data for 1000 cases were sampled with replacement from a large complete dataset of 7507 patients to obtain 500...
Volume 14
Publication date: 2014